
NXP Backend: Add imxrt700cm backend which combines the Neutron and CortexM backends#18488

Open
MartinPavella wants to merge 5 commits into pytorch:main from nxp-upstream:nxg01483/EIEX-762-create-the-aot-part-of-imxrt700cm-backend-combining-neutron-and-cortex-m

Conversation


MartinPavella (Collaborator) commented Mar 25, 2026

Summary

Add imxrt700cm backend which combines the Neutron and CortexM backends into one. The backend uses Neutron wherever possible, and the leftover nodes are handled by Cortex-M.
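The delegation order described above — Neutron wherever possible, Cortex-M for the leftovers, portable kernels for the rest — can be sketched as a toy priority partitioner. The op sets and function names below are illustrative stand-ins, not the actual ExecuTorch partitioner API:

```python
# Toy sketch of the delegation priority: try Neutron first, fall back to
# Cortex-M, and leave anything else to the portable kernels. The op sets
# and names here are illustrative, not ExecuTorch APIs.
NEUTRON_OPS = {"aten.conv2d", "aten.linear", "aten.relu"}
CORTEX_M_OPS = {"aten.add", "aten.softmax", "aten.relu"}

def assign_backend(op: str) -> str:
    if op in NEUTRON_OPS:
        return "neutron"      # the NPU handles everything it supports
    if op in CORTEX_M_OPS:
        return "cortex_m"     # leftover nodes go to the CPU backend
    return "portable"         # unsupported ops fall back to portable kernels

def partition(ops: list) -> dict:
    result = {"neutron": [], "cortex_m": [], "portable": []}
    for op in ops:
        result[assign_backend(op)].append(op)
    return result
```

Note that `aten.relu` lands on Neutron even though Cortex-M also supports it — the priority order, not the op set overlap, decides ownership.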

Test plan

Unit tests provided

cc @robert-kalmar @JakeStevens @digantdesai

@MartinPavella MartinPavella self-assigned this Mar 25, 2026
@MartinPavella MartinPavella added module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate labels Mar 25, 2026

pytorch-bot bot commented Mar 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18488

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job, 2 Unrelated Failures

As of commit ea7648a with merge base 5e77594:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 25, 2026
@MartinPavella MartinPavella force-pushed the nxg01483/EIEX-762-create-the-aot-part-of-imxrt700cm-backend-combining-neutron-and-cortex-m branch 4 times, most recently from dda9ddd to b45f3f7 (March 25, 2026 14:12)
training, the weights will be stored in the file.
:param train: Boolean indicating whether to train the model.
:param num_epochs: Number of epochs to use during training.
:param cortex_m_safe: There is a bug in the Cortex-M backend related to the `pad` operator. If this parameter is
Contributor:

Let's fix this if it is something quick as opposed to introducing a new bypass logic? WDYT - CC @rascani since you were discussing this earlier this week in the context of NHWC.

Collaborator (Author):

Good point.

The issue with the pad operator is that calling the pad replacement op in

    match op:
        case exir_ops.edge.aten.constant_pad_nd.default:
            op, args = self._get_pad_replacement(args, meta)
        case _:
            pass
    result = super().call_operator(op, args, {}, meta)

would produce a contiguous output even when the input had the channels-last dim order. I tried to find the root cause but couldn't, so I opted for the bypass and planned to report the issue after raising this PR.
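The failure mode described above — a replacement op emitting a contiguous output regardless of the input's dim order — can be illustrated with a toy sketch. The function names and tensor representation are hypothetical, not the ExecuTorch pass internals:

```python
# Toy illustration of the dim-order bug discussed above. A replacement op
# that hard-codes a contiguous output loses the channels-last dim order of
# its input. Names and the dict-based "tensor" are illustrative only.
def pad_replacement_buggy(tensor: dict) -> dict:
    # always emits a contiguous output, dropping channels-last inputs
    return {"data": tensor["data"], "dim_order": "contiguous"}

def pad_replacement_fixed(tensor: dict) -> dict:
    # propagates the input's dim order instead of hard-coding it
    return {"data": tensor["data"], "dim_order": tensor["dim_order"]}
```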

Contributor:

I have a fix for pad here: #18429

Collaborator (Author):

Thank you!
I have removed the workaround from our testing model.

from torchao.quantization.pt2e.quantizer.quantizer import Q_ANNOTATION_KEY


class IMXRT700CMQuantizer(Quantizer):
Contributor:

Quantizers are meant to be composable. A recipe is the right user-facing abstraction for targeting an SoC with multiple backends. Take a look at https://github.com/pytorch/executorch/blob/main/export/tests/test_target_recipes.py, especially something like get_android_recipe, to understand how two or more quantizers / partitioners are encapsulated and made to work together.

In your case, I imagine a target recipe for rt700 with neutron and cortex-m.
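The composition the reviewer describes can be sketched with a minimal composable-quantizer shim in the spirit of torchao's `ComposableQuantizer`: each quantizer annotates the nodes it recognizes, and later quantizers skip nodes an earlier one already claimed. The class names, the dict-based graph, and the `annotate` shape below are illustrative, not the real torchao interface:

```python
# Minimal sketch of composing two quantizers: each one annotates the
# nodes it supports, and order encodes priority (Neutron before Cortex-M).
# All names here are illustrative, not the real torchao API.
class ToyQuantizer:
    def __init__(self, supported: set, tag: str):
        self.supported, self.tag = supported, tag

    def annotate(self, graph: dict) -> None:
        for node, annotation in graph.items():
            # skip nodes an earlier quantizer already annotated
            if annotation is None and node.split("/")[0] in self.supported:
                graph[node] = self.tag

class ToyComposableQuantizer:
    def __init__(self, quantizers: list):
        self.quantizers = quantizers

    def annotate(self, graph: dict) -> dict:
        for q in self.quantizers:   # list order encodes backend priority
            q.annotate(graph)
        return graph
```

With a Neutron quantizer listed first, an op both backends support ends up annotated for Neutron, which mirrors the delegation order this PR implements.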

MartinPavella (Collaborator, Author) commented Mar 26, 2026:

Thank you @digantdesai for the insights. I have looked into it, and recipes definitely look like the right way forward.
I analyzed the state in executorch:

  • Introducing an SoC recipe would require recipes for both the Neutron and Cortex-M backends (both currently missing). Alternatively, the current Cortex-M and Neutron pipelines could be combined into a single recipe, but from a reuse perspective, a base recipe for each backend seems better in my opinion. Our Neutron backend pipeline is currently implemented in

    def to_quantized_edge_program(

  • The Neutron pipeline contains some kernel registration functionality, as only it knows which NPU kernels are required. This would probably require the creation of a new Stage type

    def to_quantized_edge_program(

    Or at least I didn't find any stage that simply executes a function based on the presence of an option.

  • QAT appears not to be supported. The QuantizeStage explicitly states that it performs post-training quantization. The SourceTransformStage also enables quantization in some way, but it doesn't seem to support QAT either. So perhaps this would require another new Stage type (or a modification of an existing stage).

Given this, enabling the RT700 Neutron+Cortex-M backend via a recipe requires changes in multiple backends, and this PR would end up quite large. Can we do this in multiple stages? For example:

  1. Experimentally, continue with this early implementation introducing the option to combine Cortex-M and Neutron Backends for the i.MXRT700.
  2. Rework the current Neutron lowering pipeline to a recipe, and the same for the Cortex-M backend. Here we would potentially introduce new Stages.
  3. Rework the imxrt700cm lowering to a recipe.
  4. Based on subsequent discussion, extend it for QAT training.

For Cortex-M we need to sync with Arm too.
What is your opinion?

…ors.

The operator `dim_order_ops._clone_dim_order.default` uses the `kwargs` to determine the output dim order. Since the `kwargs` were always empty, this operator produced an incorrect result in the pass, which broke the rest of the model.
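The bug this commit fixes — a pass that rebuilds operator calls but forwards empty kwargs — can be illustrated with a toy wrapper. The function names are hypothetical; only `dim_order_ops._clone_dim_order.default` is from the source:

```python
# Toy illustration of the kwargs bug: a pass that rebuilds operator calls
# with {} instead of the original kwargs silently drops options such as a
# dim-order argument. Names are illustrative, not the ExecuTorch pass API.
def clone_op(args, **kwargs):
    # stands in for dim_order_ops._clone_dim_order.default, which reads
    # its output dim order from kwargs
    return {"args": args, "dim_order": kwargs.get("dim_order", "contiguous")}

def buggy_pass(op, args, kwargs):
    return op(args)               # kwargs dropped -> wrong dim order

def fixed_pass(op, args, kwargs):
    return op(args, **kwargs)     # kwargs forwarded -> correct dim order
```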
@MartinPavella MartinPavella force-pushed the nxg01483/EIEX-762-create-the-aot-part-of-imxrt700cm-backend-combining-neutron-and-cortex-m branch from b45f3f7 to ea7648a (March 26, 2026 12:59)